Less-redundant Text Summarization using Ensemble Clustering Algorithm based on GA and PSO

نویسندگان

  • JUNG SONG LEE
  • HAN HEE HAHM
  • SOON CHEOL PARK
چکیده

In this paper, a novel text clustering technique is proposed to summarize text documents. The clustering method, so called ‘Ensemble Clustering Method’, combines both genetic algorithms (GA) and particle swarm optimization (PSO) efficiently and automatically to get the best clustering results. The summarization with this clustering method is to effectively avoid the redundancy in the summarized document and to show the good summarizing results, extracting the most significant and non-redundant sentence from clustering sentences of a document. We tested this technique with various text documents in the open benchmark datasets, DUC01 and DUC02. To evaluate the performances, we used F-measure and ROUGE. The experimental results show that the performance capability of our method is about 11% to 24% better than other summarization algorithms. Key-Words: Text Summarization; Extractive Summarization; Ensemble Clustering; Genetic Algorithms; Particle Swarm Optimization

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Text Summarization Using Cuckoo Search Optimization Algorithm

Today, with rapid growth of the World Wide Web and creation of Internet sites and online text resources, text summarization issue is highly attended by various researchers. Extractive-based text summarization is an important summarization method which is included of selecting the top representative sentences from the input document. When, we are facing into large data volume documents, the extr...

متن کامل

Discrete PSO with GA Operators for Document Clustering

The paper presents Discrete PSO algorithm for document clustering problems. This algorithm is hybrid of PSO with GA operators. The proposed system is based on population-based heuristic search technique, which can be used to solve combinatorial optimization problems, modeled on the concepts of cultural and social rules derived from the analysis of the swarm intelligence (PSO) with GA operators ...

متن کامل

Filter-Wrapper Approach to Feature Selection Using PSO-GA for Arabic Document Classification with Naive Bayes Multinomial

Text categorization and feature selection are two of the many text data mining problems. In text categorization, the document that contains a collection of text will be changed to the dataset format, the dataset that consists of features and class, words become features and categories of documents become class on this dataset. The number of features that too many can cause a decrease in perform...

متن کامل

High-Dimensional Unsupervised Active Learning Method

In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...

متن کامل

Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization

    Given the increasing number of documents, sites, online sources, and the users’ desire to quickly access information, automatic textual summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization as well as a useful summary of those texts including relevant document sentences. This study select...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017